Cost-Saving Effect of Crowdsourcing Learning
نویسندگان
چکیده
Crowdsourcing is widely adopted in many domains as a popular paradigm to outsource work to individuals. In the machine learning community, crowdsourcing is commonly used as a cost-saving way to collect labels for training data. While a lot of effort has been spent on developing methods for inferring labels from a crowd, few work concentrates on the theoretical foundation of crowdsourcing learning. In this paper, we theoretically study the cost-saving effect of crowdsourcing learning, and present an upper bound for the minimally-sufficient number of crowd labels for effective crowdsourcing learning. Our results provide an understanding about how to allocate crowd labels efficiently, and are verified empirically.
منابع مشابه
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
متن کاملWinning by Learning? An Exploratory Study of Knowledge Sharing and Usage on Crowdsourcing Platforms
A crowdsourcing platform enables solution seekers to use contributions from a group of online users, usually by organizing crowdsourcing contests through which online users contribute to the solution generation process. Knowledge sharing on a crowdsourcing platform could play an important role in the process of crowdsourcing contests and contestants generating high-quality solutions. On one han...
متن کاملActive Learning with Amazon Mechanical Turk
Supervised classification needs large amounts of annotated training data that is expensive to create. Two approaches that reduce the cost of annotation are active learning and crowdsourcing. However, these two approaches have not been combined successfully to date. We evaluate the utility of active learning in crowdsourcing on two tasks, named entity recognition and sentiment detection, and sho...
متن کاملLearning and Feature Selection under Budget Constraints in Crowdsourcing
The cost of data acquisition limits the amount of labeled data available for machine learning algorithms, both at the training and the testing phase. This problem is further exacerbated in real-world crowdsourcing applications where labels are aggregated from multiple noisy answers. We tackle classification problems where the underlying feature labels are unknown to the algorithm and a (noisy) ...
متن کاملMentor: A Visualization and Quality Assurance Framework for Crowd-Sourced Data Generation
Crowdsourcing is a feasible method for collecting labeled datasets for training and evaluating machine learning models. Compared to the expensive process of generating labeled datasets using dedicated trained judges, the low cost of data generation in crowdsourcing environments enables researchers and practitioners to collect significantly larger amounts of data for the same cost. However, crow...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016